Dynamic Task Fetching Over Time Varying Wireless Channels for Mobile Computing Applications

Aditya Dua, Dimitrios Tsamis, Nicholas Bambos, and Jatinder Pal Singh 

Abstract 

The processing, computation, and memory requirements posed by emerging mobile broadband services require
adaptive memory management and prefetching techniques at the mobile terminals for satisfactory application
performance and sustained device battery lifetime. In this work, we investigate a scenario where tasks with varied
computational requirements are fetched by a mobile device from a central server over an error prone wireless link. We
examine the buffer dynamics at the mobile terminal and the central server under varying wireless channel connectivity
and device memory congestion states as variable-sized tasks are executed on the terminal. Our goal is to minimize
the latency experienced by these tasks while judiciously utilizing the device buffering capability. We use a dynamic
programming framework to model the optimal prefetching policy. We further propose a) a prefetching algorithm,
Fetch-or-Not (FON), which uses a quasi-static assumption on the system state to make prefetching decisions, and b) a
prefetching policy, RFON, which uses a randomized approximation to the optimal solution, thus obviating the need for
dynamic online optimization and substantially reducing the computational complexity. Through performance evaluation
under slow and fast fading scenarios, we show that the proposed algorithms come close to the performance of the
optimal scheme.

I. Introduction 

The advent of portable devices with wireless communication capability (e.g., PDAs, mobile phones) has provided 
great impetus to mobile computing applications. A broad spectrum of wireless broadband services is being
offered to billions of users across the globe today. Some of these include location based services, streaming of
compressed media (e.g., video) to mobile users, distributed execution of parallelizable computational tasks over
multiple cooperating mobile devices, etc. All these applications are executed on mobile terminals with limited
processing power, battery life, and memory. Moreover, these terminals communicate with a central server/controller 
over error prone wireless links with fluctuating quality. Intelligent resource management and robust adaptation to 
variations in the wireless environment are therefore essential for optimizing the performance of mobile computing 
applications. 

This paper focuses on dynamic task prefetching, adaptive processing, and memory management at mobile 
terminals (MT). Broadly speaking, our goal is to minimize the latency experienced by computational tasks, while 
judiciously utilizing the scarce memory resources available at the MT. More specifically, we are interested in a 
mobile computing scenario, where the MT sequentially fetches computational tasks from a central server (CS) over 
an error prone wireless link. It takes a random amount of time to transmit the task from the CS to the MT due 
to fluctuations in wireless channel quality. Further, it takes a random amount of time to complete the execution of 
each task at the MT due to resource contention with other competing tasks under execution at the MT. 

The problem of joint buffer management and power control was addressed by Gitzenis and Bambos [2] in the 
context of client/server interaction for predictive caching. The focus of that work was to prefetch data over a varying
wireless channel using appropriate power levels; unlike the present work, it did not address dynamic execution of
application tasks or their server-end buffering. The power aware prefetching problem was also studied by Cao
[3]. Dua and Bambos [4] examined buffer management for wireless media streaming, where the objective was to
minimize buffer underflows to ensure smooth media playout, at the same time using the limited buffer at the mobile 
terminal in a careful manner. Buffer management for media streaming was also studied by Kalman et al. [5], Li et 
al. [6], etc. In other work on memory management in mobile computing scenarios, Ip et al. [7] proposed an adaptive 
buffer control scheme to prevent buffer overflows at MTs in a real-time object-oriented computing environment, 
and Yokoyama et al. [8] proposed a memory management architecture for implementing shared memory between 
the CS and the MT. To the best of our knowledge, the latency vs. buffer tradeoff in the mobile computing scenario
delineated in this paper has not been addressed in the existing literature.

Fig. 1. System Model

For convenience, we will refer to a set of related tasks as an application. An application could be stand-alone, 
involving only the CS and the MT, or could also be part of a larger distributed computation involving multiple MTs. 
From the application's perspective, the best strategy is clearly to buffer all the tasks at the MT as quickly as possible.
However, memory is a limited and expensive resource at MTs and is shared by several applications (each one
with its own set of tasks) which are executed concurrently. These applications could consist of computational tasks
fetched from a different CS or neighboring MT(s), or could also be locally generated system specific processes.

The MT needs to be "smart" in terms of the number of tasks it buffers locally for each application, because 
allocating a large chunk of memory to one application to improve its performance is likely to hurt the performance 
of other applications. From this perspective, the MT should request a new task from the CS as conservatively as 
possible. An exact analysis of the tradeoffs involved in this situation would involve modeling the dynamics of each 
application individually and considering the interactions induced between them by the shared memory resource. 
This holistic approach, however, is cumbersome (both analytically and computationally) and also not scalable. An 
alternative approach, which we adopt here, is to focus on the dynamics of one application (chosen arbitrarily) 
and model other applications as "background congestion" for this foreground application, and capture the coupling 
between them through a minimal set of parameters. 

Since the foreground and background applications share a common limited resource — the memory — the 
background applications create congestion for the foreground application. This effect can be captured through a 
congestion cost. If there were no background applications, the congestion cost would be zero and the entire memory
could be dedicated to the foreground application. On the other hand, if there were a large number of background 
applications, the congestion cost would be quite large, and the foreground application would be allocated only a 
small chunk of the memory. 

Based on the foregoing discussion, the dilemma faced by the MT in each decision epoch is the following: Fetch 
a task from the CS and possibly incur an additional congestion cost or not fetch a task and possibly increase the 
latency experienced by the application. To fetch or not to fetch? 

We address the above trade-offs by modeling the problem within a dynamic programming framework [9]. We
begin by outlining the system model and the structural properties of the dynamic programming formulation in
Section II. We discuss a special and useful instance of the optimization problem by employing simple wireless
channel and mobile terminal congestion models and establish the optimality of a switchover policy [10] in Section
III. We then discuss, in Sections IV and V, algorithms that use quasi-static and/or randomized approximations to the
optimal solution. We evaluate the performance of the proposed prefetching algorithms in Section VI and conclude
the work in Section VII.

II. System Model 

The mobile computing scenario we are interested in studying can be abstracted as a controlled two-queue tandem 
network, as depicted in Fig. 1. We assume that time is divided into identical time slots. In the figure, Q1 represents
the queue at the central server (CS) and Q2 represents the queue at the mobile terminal (MT). The MT is interested
in running an application assigned to it by the CS. An application comprises a sequence of related tasks to be
executed by the MT. A task includes a set of instructions to be processed by the MT, along with any auxiliary data
that may be needed to process the instructions. For convenience, we assume that each task can be encoded in a 
single network packet to be transmitted over the air. We will therefore use the words task and packet interchangeably 
throughout the paper. 

We note that the focus of this paper is on the problem formulation aspects and parsimonious mathematical 
modeling of a complex stochastic system. The two queue tandem formulation presented here is novel in the context 
of mobile computing scenarios. Also, the two queue tandem is a very hard control problem to analyze, as is evident 
from [11]-[13] and other similar literature. Our work is a first attempt to develop systematic approximations to the
optimal control policies associated with this class of problems. We leverage these approximations and other provable
properties of the model to develop practical algorithms. 

A. Fluctuating wireless channel 

At most one packet can be transmitted over the fluctuating wireless communication link from the CS to the MT 
in a time slot. At the beginning of every time slot, the MT has the option of fetching a task from the CS over 
this error-prone wireless link. We model the variations in wireless channel reliability with a time homogeneous 
finite state Markov chain (FSMC). This model has been widely employed in the wireless communications and 
networking literature (see [14] and references therein). An FSMC channel model is characterized by a state-space 
$\mathcal{J} = \{1, \ldots, J\}$ and an associated $J \times J$ state-transition matrix $P$. Channel state transitions are assumed to occur
at the end of each time slot. In particular, if the channel state at the beginning of the current time slot is $j$, it will
transition to $j'$ at the end of the current time slot with probability (w.p.) $P(j, j')$. With the FSMC, we associate a
mapping $s : \mathcal{J} \to [0, 1]$, such that $s(j)$ denotes the probability of successful packet transmission if the MT chooses
to fetch a packet from the CS when the channel is in state $j$.
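The channel model above can be illustrated with a short simulation. The following is a minimal sketch, not part of the paper: the function name `simulate_fsmc_channel`, the 0-based state indices, and the two-state matrix and success probabilities are all assumptions chosen for illustration.

```python
import random

def simulate_fsmc_channel(P, s, j0, num_slots, rng):
    """Simulate fetch attempts over an FSMC channel.

    P  : row-stochastic matrix, P[j][k] = Pr(next state k | current state j)
    s  : s[j] = probability that a transmission succeeds in state j
    j0 : initial channel state (0-based here, unlike the 1-based text)
    Returns a list of (state, success) pairs, one per time slot.
    """
    trace = []
    j = j0
    for _ in range(num_slots):
        # Outcome of a fetch attempted in this slot, given the current state.
        success = rng.random() < s[j]
        trace.append((j, success))
        # The channel state transitions at the end of the slot.
        j = rng.choices(range(len(P)), weights=P[j])[0]
    return trace

# Hypothetical two-state ("bad"/"good") channel; values are illustrative only.
P = [[0.9, 0.1],
     [0.2, 0.8]]
s = [0.3, 0.95]
trace = simulate_fsmc_channel(P, s, j0=0, num_slots=10, rng=random.Random(0))
```

Setting `P = [[1.0]]` and a single success probability recovers an i.i.d. channel as a special case of the same code.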

B. Time varying congestion at the MT 

We also model the state of the processor at the MT as an FSMC. The FSMC is characterized by a state-space
$\mathcal{M} = \{1, \ldots, M\}$ and a state-transition matrix $Q$. Associated with the Markov chain is a mapping $\mu : \mathcal{M} \to (0, 1]$,
such that the expected time to process a task when the processor is in state $m$ is $1/\mu(m)$ time slots. In other words,
the task execution time in state $m$ is a geometrically distributed random variable (R.V.) with parameter $\mu(m)$. This
model captures the randomness in task execution times at the MT, which can be attributed to two reasons:

1) Variable sized tasks*,

2) Contention for the shared processor at the MT (shared with tasks from other applications being executed at
the MT).

*The "size" of a task refers to the computational resources it requires. Two tasks encoded in identical sized network packets can have very
different computational requirements.

C. System state and costs

Given the above mathematical model, the two queue tandem network can be completely described at any instant
by the backlogs of the two queues, the channel state, and the processor state. More formally, let $x = (b_1, b_2, j, m)$
denote the state of the system, where $b_1$ is the number of remaining tasks at the CS (in queue Q1) for the application
of interest (foreground application), $b_2$ is the number of tasks waiting to be processed at the MT (in queue Q2),
$j \in \mathcal{J}$ is the state of the wireless link from the CS to the MT (associated with probability of successful transmission
$s(j)$), and $m \in \mathcal{M}$ is the state of the processor at the MT (associated with average task execution time $1/\mu(m)$).
Also, for ease of notation, define $b = (b_1, b_2)$, so that $x = (b, j, m)$.

A cost of 1 unit per task is incurred for every time slot that a task spends waiting in Q1, and a cost of $c > 1$
units per task is incurred for every time slot that a task spends in Q2.

Remark 1 (c captures the congestion vs. latency tradeoff): The parameter c represents the congestion cost as 
experienced by the foreground application at the MT. If c is small, the MT is likely to fetch tasks from the 
CS as quickly as possible in order to reduce the overall latency experienced by the application. On the other hand, 
if c is large, the MT is unlikely to fetch a task from the CS until its local buffer Q2 is empty, thereby resulting in 
higher latency for the application. The parameter c therefore captures the tradeoff between the congestion cost and 
the latency cost. More importantly, c captures the coupling between the background and foreground applications 
in a parsimonious fashion, without explicitly modeling the former. A well chosen value for c ensures that the MT 
requests tasks judiciously from the CS: infrequently enough to prevent buffer overflow at the MT (resulting
in potential disruption of other applications competing for the shared MT resources) and frequently enough to 
prevent buffer underflows (resulting in potential processor under-utilization). Thus, c is a critical design parameter 
and determines the operating point of the system. We will examine system performance as a function of c via 
simulations in Section VI. 

Remark 2 (Buffering at the CS is not free): Typically, availability of memory at the CS will not be a bottleneck 
in a real system. Then why should tasks queued at the CS incur a buffering cost of 1 unit per time slot in our 
model? This cost creates a backlog pressure, which drives down the overall latency experienced by the foreground 
application. To see this argument clearly, consider a scenario where tasks queued at the CS do not incur a backlog 
cost. Clearly then, in order to minimize overall buffering costs, the MT will request a new task only when it has 
finished processing the currently executing task. Since packet transmission times over the wireless link are random, 
the consequence would be potential under-utilization of the processor at the MT (especially in the absence of 
competing background applications), which is bad from an application latency perspective. A non-zero buffering 
cost at the CS prevents such a scenario from arising. If the MT requests tasks from the CS too infrequently in 
order to reduce buffering costs, the backlog pressure at the CS starts building up, which eventually forces the MT
to request a task, and thereby drives down the application latency cost. Since any non-zero holding cost at the CS
will suffice for generating the requisite backlog pressure, we set it to a normalized value of 1, without any loss of
generality.

New state                      Transition probability
$(b_1, b_2 - 1, j', m')$       $\mu(m)\, P(j,j')\, Q(m,m')$
$(b_1, b_2, j', m')$           $[1 - \mu(m)]\, P(j,j')\, Q(m,m')$

TABLE I
Possible state transitions and associated transition probabilities if a policy $\pi$ selects action $\overline{\mathrm{FE}}$ in system state
$(b_1, b_2, j, m)$, assuming $b_2 > 0$.

New state                      Transition probability
$(b_1 - 1, b_2, j', m')$       $s(j)\, \mu(m)\, P(j,j')\, Q(m,m')$
$(b_1, b_2 - 1, j', m')$       $[1 - s(j)]\, \mu(m)\, P(j,j')\, Q(m,m')$
$(b_1 - 1, b_2 + 1, j', m')$   $s(j)\, [1 - \mu(m)]\, P(j,j')\, Q(m,m')$
$(b_1, b_2, j', m')$           $[1 - s(j)]\, [1 - \mu(m)]\, P(j,j')\, Q(m,m')$

TABLE II
Possible state transitions and associated transition probabilities if a policy $\pi$ selects action FE in system state
$(b_1, b_2, j, m)$, assuming $b_1, b_2 > 0$.

D. Actions and system dynamics 

In our model, we assume that given the initial system state at time $t = 0$, no further tasks arrive to queue
Q1 at the CS. Thus, the eventual state of the system when all tasks have been executed is $(0, 0, j, m)$, for some
$j \in \mathcal{J}$, $m \in \mathcal{M}$, i.e., both queues will eventually be empty†. In other words, all states of type $(0, 0, j, m)$ are
terminal states for the system. Given the system state $x$ at the beginning of a time slot, the MT has to choose one
of two actions:

1) FE: fetch a task from the CS, or

2) $\overline{\mathrm{FE}}$: do not fetch a task from the CS.

Our objective is to determine the optimal policy, i.e., the optimal sequence of actions (FE or $\overline{\mathrm{FE}}$) that drives the
system from any given initial state to one of its terminal states, while incurring the lowest possible expected cost.
We first formally define the notion of a policy.

Definition 1: A policy $\pi$ is a mapping $\pi : \mathbb{Z}_+ \times \mathbb{Z}_+ \times \mathcal{J} \times \mathcal{M} \to \{\mathrm{FE}, \overline{\mathrm{FE}}\}$, which assigns one of the two
possible actions (FE or $\overline{\mathrm{FE}}$) to each system state $x$.

The possible state transitions along with the associated state transition probabilities when policy $\pi$ chooses action
$\overline{\mathrm{FE}}$ in state $(b_1, b_2, j, m)$, assuming $b_2 > 0$, are tabulated in Table I. Similarly, the possible state transitions and
associated probabilities when it selects action FE in state $(b_1, b_2, j, m)$, assuming $b_1, b_2 > 0$, are listed in Table II.
The state transitions are similarly described for the boundary cases $b_1 = 0, b_2 > 0$ and $b_1 > 0, b_2 = 0$.

†The case of stochastic task arrivals to the queue at the CS (Q1) can be easily studied in our framework, if the arrival process can be modeled
as a discrete time Markov chain. For example, i.i.d. Bernoulli arrivals and correlated bursty arrivals fall into this category. Incorporating dynamic
packet arrivals complicates the description and analysis of the system model, without providing significant additional insight into the system
behavior. We therefore chose a "buffer draining model" for our exposition.
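The one-slot dynamics of Tables I and II can be sketched as a simulation step. This is an illustrative sketch only: the function name `step`, the 0-based state indices, and in particular the boundary conventions (a fetch cannot succeed when Q1 is empty, and a task cannot complete in a slot that starts with Q2 empty) are assumptions, since the text does not spell out the boundary cases.

```python
import random

def step(state, fetch, P, Q, s, mu, c, rng):
    """Advance the two-queue tandem by one time slot.

    state = (b1, b2, j, m), with 0-based channel/processor indices.
    fetch = True for action FE, False for "do not fetch".
    Returns (next_state, cost), where cost = b1 + c * b2 is the holding
    cost charged for the current slot.
    """
    b1, b2, j, m = state
    cost = b1 + c * b2
    # A fetch succeeds only if attempted, Q1 is non-empty, and the channel cooperates.
    fetched = fetch and b1 > 0 and rng.random() < s[j]
    # Assumed boundary convention: only a task already waiting in Q2 can complete.
    done = b2 > 0 and rng.random() < mu[m]
    b1 -= int(fetched)
    b2 += int(fetched) - int(done)
    # Channel and processor states evolve independently at the end of the slot.
    j = rng.choices(range(len(P)), weights=P[j])[0]
    m = rng.choices(range(len(Q)), weights=Q[m])[0]
    return (b1, b2, j, m), cost
```

With both events possible in a slot, the four outcomes of a fetch attempt reproduce the four rows of Table II, and the two outcomes without a fetch reproduce Table I.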

E. Dynamic programming (DP) formulation 

Given the system dynamics in Section II-D, we are interested in computing the optimal policy $\pi^*$, which minimizes
the total expected cost incurred in reaching a terminal state $(0, 0, *, *)$, starting in any state $x = (b, j, m)$. This is
a stochastic shortest path (SSP) problem and is amenable to solution in a DP framework. Denoting by $V(b, j, m)$
the value function, i.e., the expected cost incurred under the optimal policy $\pi^*$ in reaching a terminal state, starting
in state $(b, j, m)$, we know from the theory of dynamic programming that $V(\cdot)$ satisfies the following Bellman's
equations for all $b_1, b_2 > 0$:

$$
\begin{aligned}
V(b, j, m) = \min\Big\{ &\sum_{j'=1}^{J} \sum_{m'=1}^{M} P(j, j')\, Q(m, m') \big[ \mu(m) V(b - e_2, j', m') + \bar{\mu}(m) V(b, j', m') \big], \\
&\sum_{j'=1}^{J} \sum_{m'=1}^{M} P(j, j')\, Q(m, m') \big[ s(j) \mu(m) V(b - e_1, j', m') + \bar{s}(j) \mu(m) V(b - e_2, j', m') \\
&\qquad + s(j) \bar{\mu}(m) V(b - e_1 + e_2, j', m') + \bar{s}(j) \bar{\mu}(m) V(b, j', m') \big] \Big\} + \langle \mathbf{c}, b \rangle,
\end{aligned}
\tag{1}
$$

where $b = (b_1, b_2)$, $e_1 = (1, 0)$, $e_2 = (0, 1)$, $\mathbf{c} = (1, c)$, $\bar{s}(j) = 1 - s(j)$, $\bar{\mu}(m) = 1 - \mu(m)$, and $\langle \mathbf{c}, b \rangle = b_1 + c b_2$.
The first argument of min is the expected cost of choosing action $\overline{\mathrm{FE}}$ in state $x$, while the second argument is the
expected cost of choosing action FE. Similar equations capture the boundary conditions $b_1 = 0$ or $b_2 = 0$. Finally,
we have $V(0, 0, j, m) = 0$ for all $j, m$, i.e., all terminal states are associated with zero cost.

III. A Special and Insightful Case 

The DP equations in (1) can be solved numerically to compute the optimal policy $\pi^*$. Well developed numerical
techniques like value iteration and policy iteration are available to solve the DP equations in an efficient manner
[9]. In principle, the equations can be solved offline, and the optimal action for every possible system state can be
stored in a lookup table (LuT) at the MT. In an online implementation, the MT simply looks at the current system
state (backlogs of Q1 and Q2, channel state, and processor state) and extracts the optimal action (FE or $\overline{\mathrm{FE}}$) from
the LuT. Such an implementation is, however, fraught with the following difficulties:

1) The DP formulation presented above assumes a fixed value of c. In practice, the MT may wish to update the 
value of c, based on its observation of the processor utilization, channel conditions, etc., to drive the system 
to a desired operating point. Recall that c captures the tradeoff between congestion and latency and hence 
determines the operating point of the two queue tandem. Thus, every time c changes, the DP equations in 
(1) need to be solved again, which is a computationally cumbersome task.



2) Even if computational complexity is not an issue, obtaining/estimating all the system parameters in real time 
is a non-trivial task. In particular, the MT may need an unacceptably long time to empirically estimate with 
sufficient accuracy the state transition matrices P and Q, associated with the wireless channel and processor 
utilization, respectively. In a realistic setting, the MT can at best hope to estimate the instantaneous probability 
of successful transmission and instantaneous task execution speed (or the corresponding values averaged over 
a moving window). 

Keeping the above implementation issues in mind, it is critical to devise a fetching algorithm which does not 
require frequent recomputation of the optimal policy from the DP equations, and also does not have unrealistic 
requirements in terms of the system parameters it needs to be cognizant of. To this end, we now turn our attention 
to a simple and insightful instance of the general system model described in Section II. 

In particular, we introduce the following two modeling reductions: 

1) Wireless Channel: The multi-state FSMC wireless channel is replaced by a two-state ON/OFF Bernoulli
channel, which can be in ON state w.p. $s$ and in OFF state w.p. $1 - s$ in every time slot, independent of its
state in past and future time slots. When the channel is in ON state, a packet transmission over the channel is
successful w.p. 1, and when the channel is in OFF state, a packet transmission over the channel fails w.p. 1.
Using notation introduced in Section II-A, this i.i.d. channel model is characterized by state space $\mathcal{J} = \{1\}$
and state transition matrix $P = [1]$. Since $|\mathcal{J}| = 1$, the mapping $s : \mathcal{J} \to [0, 1]$ reduces to a scalar $s$. Note that
under the i.i.d. channel model, packet transmission times are geometrically distributed with mean transmission
time $1/s$ slots.

2) Congestion at the MT: We assume that the time to process each task at the MT follows an i.i.d.
geometric distribution with parameter $\mu$. Again, this is a special case of the multi-state FSMC model used in
the previous section to model time varying congestion at the MT. In terms of the notation used in Section II-B,
the simplified model is associated with the state-space $\mathcal{M} = \{1\}$ and state transition matrix $Q = [1]$. Since
task processing times are now characterized by a single geometric distribution, the mapping $\mu : \mathcal{M} \to (0, 1]$
reduces to the scalar $\mu$.
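As a brief check of the stated means, the following worked equation uses only the standard geometric distribution; the symbol $T$ for the number of slots until the first success is introduced here for illustration and assumes $s > 0$.

```latex
\Pr\{T = k\} = (1 - s)^{k-1} s, \qquad k = 1, 2, \ldots
\\
\mathbb{E}[T] = \sum_{k=1}^{\infty} k (1 - s)^{k-1} s
             = s \cdot \frac{1}{\bigl(1 - (1 - s)\bigr)^{2}}
             = \frac{1}{s}
```

The same computation with $\mu$ in place of $s$ gives a mean task execution time of $1/\mu$ slots.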

To summarize, the reduced system model is characterized by two key parameters: the probability of successful
packet transmission over the wireless link from the CS to the MT (denoted $s$), and the average time needed to
process each task at the MT (denoted $1/\mu$). Since these parameters are fixed for each instantiation of the reduced
model (in contrast to the general model of Section II, where they were modeled as FSMCs with states denoted by
$j$ and $m$, respectively), they need not be included in the description of the system state. Thus, the state associated
with the reduced system model is two dimensional (in contrast to the four dimensional state of the general model),
and is denoted by $b = (b_1, b_2)$.

The action space associated with the reduced model remains unchanged, i.e., each state is associated with one
of two possible actions, FE or $\overline{\mathrm{FE}}$. The system dynamics can be described as in Section II-D. The value function
now obeys the following Bellman's equations (obtained as a special case of (1), with $J = 1$ and $M = 1$):

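$$
V(b) = \min\Big\{ \underbrace{\mu V(b - e_2) + \bar{\mu} V(b)}_{\text{Action } \overline{\mathrm{FE}}},\;
\underbrace{s \mu V(b - e_1) + \bar{s} \mu V(b - e_2) + s \bar{\mu} V(b - e_1 + e_2) + \bar{s} \bar{\mu} V(b)}_{\text{Action FE}} \Big\} + \langle \mathbf{c}, b \rangle,
\tag{2}
$$

where $\bar{s} = 1 - s$, $\bar{\mu} = 1 - \mu$, and the rest of the notation is as defined in Section II-E. Finally, the terminal state
associated with the above DP problem is $(0, 0)$, with $V(0, 0) = 0$.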
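The reduced Bellman equations can be solved numerically by a single sweep over the states, because each action's self-loop term in $V(b)$ can be solved out algebraically. The sketch below is one possible implementation and not the authors' code; the function name `solve_reduced_dp`, the truncation to states with $b_1 + b_2 \le n$, and the boundary handling (only processing is possible when $b_1 = 0$, and the MT must fetch when $b_2 = 0$) are assumptions, since the text only states that similar equations hold at the boundaries. It requires $s > 0$ and $\mu > 0$.

```python
def solve_reduced_dp(n, s, mu, c):
    """Solve the reduced-model Bellman equations on {b1 + b2 <= n}.

    Returns (V, fetch): V[b1][b2] is the optimal cost-to-go and
    fetch[b1][b2] is True when fetching (FE) is optimal in state (b1, b2).
    """
    sb, mb = 1.0 - s, 1.0 - mu            # failure / non-completion probabilities
    V = [[0.0] * (n + 1 - b1) for b1 in range(n + 1)]
    fetch = [[False] * (n + 1 - b1) for b1 in range(n + 1)]
    for b1 in range(n + 1):
        for b2 in range(n + 1 - b1):
            if b1 == 0 and b2 == 0:
                continue                   # terminal state, V = 0
            hold = b1 + c * b2             # holding cost <c, b> for one slot
            if b1 == 0:                    # nothing left to fetch
                V[b1][b2] = V[0][b2 - 1] + hold / mu
            elif b2 == 0:                  # assumed boundary: MT idle, must fetch
                V[b1][b2] = V[b1 - 1][1] + hold / s
                fetch[b1][b2] = True
            else:
                # Each action has a self-loop on V(b); solve it out explicitly.
                v_no = V[b1][b2 - 1] + hold / mu
                v_fe = (s * mu * V[b1 - 1][b2] + sb * mu * V[b1][b2 - 1]
                        + s * mb * V[b1 - 1][b2 + 1] + hold) / (1.0 - sb * mb)
                fetch[b1][b2] = v_fe < v_no
                V[b1][b2] = min(v_fe, v_no)
    return V, fetch

# Example call with one of the parameter combinations used later in the text.
V, fetch = solve_reduced_dp(n=20, s=0.6, mu=0.8, c=1.2)
```

The sweep order (increasing $b_1$, then increasing $b_2$) guarantees that $V(b - e_1)$, $V(b - e_2)$, and $V(b - e_1 + e_2)$ are available when state $b$ is processed. The decision regions can then be read directly off the `fetch` table.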

Remark 3 (Advantages of model reduction): The modeling reductions introduced above help reduce the dimen- 
sionality of the problem from 4 to 2, which greatly facilitates analysis and algorithm design. Further, fetching 
algorithms designed on the basis of the reduced model do not suffer from the implementation hurdles discussed at 
the beginning of this section. We will revisit this claim in greater detail when we discuss algorithm design in the 
next section. Further, we will demonstrate via simulations that fetching algorithms based on the reduced model can 
closely match the performance of those based on the full fledged model of Section II. 

Even though the DP equations associated with the reduced model cannot be solved in closed form, numerous 
interesting structural properties of the optimal solution can be established, which provide useful insights and intuition 
about the decision tradeoffs inherent in the problem, in addition to aiding low complexity algorithm design. We 
prove one such structural property below and illustrate a few others via numerical examples. 

A. Switchover property 

We begin by providing the formal definition of a switchover type policy. 

Definition 2: A policy $\pi$ is of switchover type if there exists a non-decreasing switchover curve $\psi : \mathbb{Z}_+ \to \mathbb{Z}_+$
such that $\pi$ chooses action $\overline{\mathrm{FE}}$ in state $(b_1, b_2)$ if $b_2 > \psi(b_1)$, and chooses action FE otherwise.

A switchover policy splits the two dimensional state-space into two distinct decision regions, one corresponding
to each of the actions FE and $\overline{\mathrm{FE}}$. The optimal policy of interest here, viz. $\pi^*$, is of switchover type, in the sense
of the above definition.
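Once a switchover curve is known, the online decision reduces to a single comparison. The snippet below is a minimal sketch of that rule; the function name, the string labels, and the example curve `psi` are hypothetical and are not derived from the DP.

```python
def switchover_action(b1, b2, psi):
    """Return "FE" (fetch) or "NOT_FE" (do not fetch) for state (b1, b2).

    psi is a non-decreasing switchover curve: psi(b1) is the largest MT
    backlog b2 at which fetching is still chosen.
    """
    return "NOT_FE" if b2 > psi(b1) else "FE"

# Hypothetical curve, for illustration only (not computed from the DP).
psi = lambda b1: b1 // 2

action = switchover_action(10, 2, psi)   # MT backlog is below the curve
```

Storing only the curve, rather than an action for every state, reduces the lookup table from one entry per state to one entry per value of $b_1$.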

Theorem 1: The optimal policy $\pi^*$ is of switchover type.

Proof: See Appendix VIII-A. ■ 

Remark 4 (A similar result): Theorem 1 is similar to a continuous time result proved by Rosberg et al. [11],
where packet transmission times and task execution times were assumed to be exponentially distributed. However,
the authors in [11] allowed for new task arrivals to Q1, and their objective was to compute the optimal policy
which minimizes the infinite horizon discounted expected cost.

Fig. 2. Numerical Example 1

Remark 5 (Load balancing effect): The switchover property of $\pi^*$ is intuitively quite appealing. When $b_1 \gg b_2$,
i.e., the queue at the CS is much more loaded than the queue at the MT, the optimal action for the MT is FE.
As the MT fetches more and more packets, $b_1$ starts decreasing and $b_2$ starts increasing, and the optimal decision
eventually switches over from FE to $\overline{\mathrm{FE}}$. Similarly, when $b_2 \gg b_1$, the optimal action is $\overline{\mathrm{FE}}$. Thus, the MT stops
fetching tasks from the CS and the size of Q2 relative to Q1 starts diminishing, until eventually the optimal action
switches over from $\overline{\mathrm{FE}}$ to FE. Thus, the switchover nature of the optimal policy induces a load balancing effect
between the two queues Q1 and Q2.

We conclude this section by demonstrating some more structural properties of the optimal policy $\pi^*$ via numerical
examples.

Numerical Example 1: This example illustrates the behavior of the optimal policy $\pi^*$ for different model parameters.
Fig. 2 depicts the optimal switchover curve (computed from (2)) for different combinations of $s$, $\mu$, and $c$.
In particular, we consider the following combinations of $(s, \mu, c)$: (0.6, 0.8, 1.2), (0.8, 0.8, 1.2), (0.6, 0.9, 1.2), and
(0.6, 0.8, 1.5).

We make the following observations from the numerical examples: 

1) The decision region for action FE gets smaller as $s$ increases, for fixed $\mu$, $c$. As the wireless channel becomes
more reliable, the MT can afford to fetch tasks from the CS less aggressively, since fewer attempts are needed
on average to fetch a packet successfully.

2) The decision region for action FE grows bigger as $\mu$ increases, for fixed $s$, $c$. Since increasing $\mu$ decreases
average task execution times, the MT has to fetch tasks more frequently from the CS in order to drive down
application latency.

3) The decision region for action FE gets smaller as $c$ increases, for fixed $s$, $\mu$. A bigger $c$ indicates a higher
congestion cost for the foreground application, forcing the MT to be conservative in fetching tasks from the
CS.

IV. Dynamic Task Fetching Algorithms 

In Section II, we developed a mathematical framework based on dynamic programming for studying dynamic task 
fetching algorithms for mobile computing scenarios. In Section III we briefly discussed an LuT based implementation 
of the optimal policy obtained by solving the DP equations. To surmount the practical difficulties associated with 
implementing the LuT based approach, we then turned our attention to a reduced but insightful version of the general 
model of Section II. The probability of successful packet transmission over the wireless link and the mean task 
execution time at the MT are fixed under the reduced model — assumptions which will clearly be violated in real 
life scenarios. Therefore, in this section we focus on designing task fetching algorithms for dynamic scenarios (i.e., 
time varying successful transmission probability and mean task execution times), while leveraging the simplicity 
of the static solution obtained from the analysis of the reduced model. We propose two algorithms: 

• Fetch-Or-Not (FON): This algorithm is based on the solution to the DP equations in (2). 

• Randomized Fetch-Or-Not (RFON): This algorithm is based on a randomized approximation (denoted by 
RAND) of the solution to the DP equations in (2). Thus, RFON does not actually need to solve any DP 
equations in real time. 

Both FON and RFON are based on the notion of quasi-static decision making. Broadly speaking, a quasi-static 
decision algorithm bases its decision at every decision epoch on the solution to a static instance/snapshot of a 
dynamic system, where the static instance is constructed based on the instantaneous operating point of the system. 
As the instantaneous operating point of the system evolves from one epoch to another (governed by the inherent 
system dynamics), the parameters of the static snapshots used by the quasi-static algorithm change accordingly. 
This approach was used to great effect for developing downlink wireless packet scheduling algorithms by Dua et al. 
[15]. In the context of this paper, the instantaneous operating point refers to the two tuple $(s, \mu)$, i.e., the probability
of successful transmission and the task execution rate at the MT. 

A. FON and RFON 

The FON algorithm works as follows: Given the instantaneous estimates of $s$ and $\mu$ in time slot $t$, denoted by $s(t)$
and $\mu(t)$, the DP equations associated with the reduced model, viz. (2), are solved with $s = s(t)$, $\mu = \mu(t)$, to select
either decision FE or $\overline{\mathrm{FE}}$ for the current backlog vector $b(t) = (b_1(t), b_2(t))$. Based on the outcome of the decision
(FE or $\overline{\mathrm{FE}}$), the backlog vector changes to $b(t + 1) = (b_1(t + 1), b_2(t + 1))$. Estimates of the success probability
and task execution rate are also updated to $s(t + 1)$ and $\mu(t + 1)$ respectively, based on measurements made
by the MT*.

The RFON algorithm is fairly similar to the FON algorithm, except that the decision in each time slot is based
on a randomized approximation (RAND) to $\pi^*$, instead of $\pi^*$ itself. Thus, RFON has lower computational complexity
than FON because, unlike FON, it does not need to solve the DP equations in (2) in every time slot. We will devote
the next section to studying the randomized approximation RAND.

Both FON and RFON algorithms are summarized in Table III. 

TABLE III
Dynamic fetching algorithms: FON and RFON

In time slot $t$:

Given
• estimate of the probability of successful transmission, $s(t)$,
• estimate of the task execution rate, $\mu(t)$,
• choice of congestion cost rate, $c(t)$.

Compute
• $\pi^*$ from (2) (for FON) or RAND (for RFON) with $s = s(t)$, $\mu = \mu(t)$, and $c = c(t)$.

Select
• action FE/$\overline{FE}$ based on the outcome of $\pi^*$ (for FON) or RAND (for RFON).

Update
• $s(t) \to s(t+1)$, $\mu(t) \to \mu(t+1)$, and $c(t) \to c(t+1)$.

Remark 6 ($c$ can be adapted to achieve a desired tradeoff): In addition to dynamically estimating $s$ and $\mu$, the
MT can vary/adapt the tradeoff parameter c to achieve a desired tradeoff between application latency and local 
buffer congestion. Recall that under the system model of Section II (and its reduced version in Section III), c 
was assumed to be fixed. We will not consider algorithms for dynamically adapting c in this paper. Instead, we 
will simulate the performance of the algorithms FON and RFON with different (but fixed) values of c to "sweep" 
tradeoff curves for the system. 
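
The per-slot procedure of Table III can be sketched as a small simulation loop. This is only an illustrative sketch, not the authors' implementation: the function names (`step`, `run_quasi_static`, `always_fetch`), the `estimates` callback, and the assumption that the estimates $s(t)$, $\mu(t)$ coincide with the true per-slot parameters are all hypothetical. The per-slot cost $b_1 + c\,b_2$ follows the unit backlog cost at $Q_1$ and congestion cost rate $c$ at $Q_2$ used in the paper. Any decision rule (FON, RFON, or a benchmark) can be plugged in as `decide`.

```python
import random

FETCH, NO_FETCH = "FE", "not-FE"

def step(b, action, s, mu, rng):
    """Advance the two-queue tandem by one slot and return the new backlog."""
    b1, b2 = b
    # A fetch moves one task from the CS (Q1) to the MT (Q2) w.p. s.
    if action == FETCH and b1 > 0 and rng.random() < s:
        b1, b2 = b1 - 1, b2 + 1
    # The MT completes one task w.p. mu if its buffer is non-empty.
    if b2 > 0 and rng.random() < mu:
        b2 -= 1
    return (b1, b2)

def run_quasi_static(b0, decide, estimates, c, rng, max_slots=100_000):
    """Run until both queues are empty; return (number of slots, total cost)."""
    b, t, cost = tuple(b0), 0, 0.0
    while b != (0, 0) and t < max_slots:
        s_t, mu_t = estimates(t)          # instantaneous operating point
        action = decide(b, s_t, mu_t, c)  # quasi-static decision for this slot
        cost += b[0] + c * b[1]           # backlog cost + congestion cost
        b = step(b, action, s_t, mu_t, rng)
        t += 1
    return t, cost

def always_fetch(b, s, mu, c):
    """Benchmark rule: fetch whenever the CS queue is non-empty."""
    return FETCH if b[0] > 0 else NO_FETCH
```

Sweeping a tradeoff curve as described in Remark 6 then amounts to calling `run_quasi_static` repeatedly with different fixed values of `c` and averaging over runs.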

V. RAND: A RANDOMIZED APPROXIMATION TO $\pi^*$

In this section, we develop a randomized approximation RAND to $\pi^*$, based on the method of policy iteration
[9]. Recall from Section IV that RAND is at the core of the dynamic fetching algorithm RFON.

Our first step is to analyze two "extreme" policies (under the assumptions of the reduced model of Section III):

1) $\pi_N$: Never fetches a task from the CS until the buffer at the MT is empty, and

2) $\pi_A$: Continues to fetch tasks from the CS until the buffer at the CS has been drained.

†A discussion of the estimation algorithms used by the MT is beyond the scope of this paper. For ease of presentation, we will assume that
the MT can estimate both parameters accurately. However, it is important to note that the MT can only estimate instantaneous values of these
parameters and not the underlying stochastic processes which drive the system dynamics.

A. The "never fetch" policy: $\pi_N$

Consider policy $\pi_N$, which never chooses to fetch a task from the CS except when the buffer at the MT (queue
$Q_2$) is empty, i.e., $\pi_N$ selects action $\overline{FE}$ in all states $(b_1, b_2)$ with $b_2 > 0$. We are interested in computing the
expected cost incurred under $\pi_N$ in reaching the terminal state $(0, 0)$, as a function of the initial state $(b_1, b_2)$. Denoting
this cost by $C_N(\mathbf{b})$, we have:
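Lemma 1:

$$C_N(\mathbf{b}) = \frac{1}{2}\left(\frac{\bar{\mu}}{\mu} + \frac{1}{s}\right) b_1^2 + \frac{c\, b_2^2}{2\mu} + \frac{b_1 b_2}{\mu} + \left[\frac{1}{2}\left(\frac{\bar{\mu}}{\mu} + \frac{1}{s}\right) + \frac{(c-1)\bar{\mu}}{\mu}\right] b_1 + \frac{c\, b_2}{2\mu},$$

where $\bar{\mu} = 1 - \mu$.

Proof: See Appendix VIII-B. ■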
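
As a sanity check on Lemma 1, the closed form can be compared against the one-step recursions for the never-fetch policy stated in Appendix VIII-B ($C_N(\mathbf{b}) = C_N(\mathbf{b} - \mathbf{e}_2) + (b_1 + c\,b_2)/\mu$ for $b_2 > 0$, and $C_N(b_1, 0) = \mu C_N(b_1 - 1, 0) + \bar{\mu} C_N(b_1 - 1, 1) + b_1/s$ for $b_2 = 0$). The sketch below is illustrative only; the function names are hypothetical and the closed form is the quadratic expression as reconstructed here.

```python
from functools import lru_cache

def never_fetch_cost_closed_form(b1, b2, s, mu, c):
    """Quadratic closed form for C_N(b) under the never-fetch policy."""
    mu_bar = 1.0 - mu
    k = mu_bar / mu + 1.0 / s
    return (0.5 * k * b1 ** 2
            + c * b2 ** 2 / (2.0 * mu)
            + b1 * b2 / mu
            + (0.5 * k + (c - 1.0) * mu_bar / mu) * b1
            + c * b2 / (2.0 * mu))

def never_fetch_cost_recursive(b1, b2, s, mu, c):
    """Evaluate C_N(b) directly from the one-step recursions."""
    mu_bar = 1.0 - mu

    @lru_cache(maxsize=None)
    def cost(x1, x2):
        if x1 == 0 and x2 == 0:
            return 0.0  # terminal state
        if x2 > 0:
            # No fetch: wait (1/mu slots on average) for one task to finish.
            return cost(x1, x2 - 1) + (x1 + c * x2) / mu
        # MT buffer empty: fetch, solving the b2 = 0 equation for C_N(b1, 0).
        return mu * cost(x1 - 1, 0) + mu_bar * cost(x1 - 1, 1) + x1 / s

    return cost(b1, b2)
```

For example, with $s = \mu = 0.5$ and $c = 2$, both functions give $C_N(1, 0) = 4$ and $C_N(1, 1) = 10$.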

B. The "always fetch" policy: $\pi_A$

Now, in contrast to $\pi_N$, consider a policy $\pi_A$ which always chooses to fetch a task from the CS if one is available,
i.e., it chooses the action FE in all states $\mathbf{b}$ with $b_1 > 0$. Again, we are interested in computing the expected cost
incurred under $\pi_A$ in reaching the terminal state $(0, 0)$, as a function of the initial state $(b_1, b_2)$. We will denote this
cost by $C_A(\mathbf{b})$. It is not possible to compute $C_A(\mathbf{b})$ in closed form; therefore, we approximate $C_A(\mathbf{b})$
using a fluid caricature model. We denote the corresponding cost in the fluid model by $C_A^f(\mathbf{b})$. The attributes of
the fluid model are described in Appendix VIII-C. We have the following result for the fluid caricature model:

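Lemma 2:

$$C_A^f(\mathbf{b}) = \begin{cases} \left(\dfrac{c}{\mu} - \dfrac{c-1}{s}\right)\dfrac{b_1^2}{2} + \dfrac{c\, b_2^2}{2\mu} + \dfrac{c\, b_1 b_2}{\mu} & ;\ s > \mu, \text{ or } s < \mu \text{ and } T_1 > T_0, \\[2ex] \dfrac{b_1^2}{2s} + \dfrac{c\, b_2^2}{2(\mu - s)} & ;\ s < \mu \text{ and } T_1 < T_0, \end{cases}$$

where $T_0 = \dfrac{b_1}{s}$ and $T_1 = \dfrac{b_2}{\mu - s}$.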

Proof: See Appendix VIII-C. ■

We use $C_A^f(\mathbf{b})$ computed in Lemma 2 as an approximation to $C_A(\mathbf{b})$. We explore the efficacy of
the approximation via a numerical example in Appendix VIII-C.
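
The fluid cost can also be checked numerically. The sketch below, under the fluid model assumptions of Appendix VIII-C (fluid leaves $Q_1$ at rate $s$, leaves $Q_2$ at rate $\mu$, unit backlog cost at $Q_1$ and cost rate $c$ at $Q_2$), compares a piecewise closed form with a simple Euler integration of the fluid trajectories. The function names and the step size `dt` are illustrative choices, and the case split (first expression when $Q_1$ empties first, second when $Q_2$ empties first) is the one that is consistent with the fluid dynamics.

```python
def fluid_cost_closed_form(b1, b2, s, mu, c):
    """Piecewise closed form for the fluid always-fetch cost."""
    t0 = b1 / s                      # time at which Q1 empties
    if s < mu and b2 / (mu - s) < t0:
        # Q2 empties first; Q1 then drains at rate s with Q2 staying empty.
        return b1 ** 2 / (2.0 * s) + c * b2 ** 2 / (2.0 * (mu - s))
    # Q1 empties first; Q2 then drains at rate mu.
    return ((c / mu - (c - 1.0) / s) * b1 ** 2 / 2.0
            + c * b2 ** 2 / (2.0 * mu)
            + c * b1 * b2 / mu)

def fluid_cost_numeric(b1, b2, s, mu, c, dt=1e-3):
    """Left Riemann sum of the cost rate q1 + c*q2 along the fluid trajectory."""
    q1, q2, cost = float(b1), float(b2), 0.0
    while q1 > 0.0 or q2 > 0.0:
        cost += (q1 + c * q2) * dt
        inflow = min(s * dt, q1)               # fluid moved from Q1 to Q2
        q1 -= inflow
        q2 = max(q2 + inflow - mu * dt, 0.0)   # Q2 served at rate mu
    return cost
```

With $c = 2$, the closed form gives about 71.67 for $(b_1, b_2) = (4, 3)$, $s = 0.8$, $\mu = 0.6$, and about 18.33 for $(b_1, b_2) = (4, 1)$, $s = 0.6$, $\mu = 0.8$; the numerical integration agrees to within the discretization error.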

C. Policy Iteration 

Policy iteration is a well known numerical technique for solving Bellman's equations [9]. Given a feasible policy
$\pi$, each iteration of policy iteration comprises two steps:

1) Policy evaluation: In this step, the expected cost incurred under policy $\pi$, denoted $V_\pi(\mathbf{b})$, is evaluated $\forall\, \mathbf{b}$.

2) Policy improvement: In this step, the policy $\pi$ is "improved" to obtain a new policy $\pi'$. The improved policy
$\pi'$ is computed as follows:

$$\pi'(\mathbf{b}) = \arg\min\Big\{ \underbrace{\mu V_\pi(\mathbf{b} - \mathbf{e}_2) + \bar{\mu} V_\pi(\mathbf{b})}_{\text{Action } \overline{FE}},\ \underbrace{s\mu V_\pi(\mathbf{b} - \mathbf{e}_1) + \bar{s}\mu V_\pi(\mathbf{b} - \mathbf{e}_2) + s\bar{\mu} V_\pi(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\bar{\mu} V_\pi(\mathbf{b})}_{\text{Action FE}} \Big\}, \tag{3}$$

where $\pi'(\mathbf{b})$ denotes the action chosen by policy $\pi'$ (either FE or $\overline{FE}$) in state $\mathbf{b}$.
The policy $\pi'$ is called a one step improvement of $\pi$. We are interested in computing one step improvements of
the policies $\pi_N$ and $\pi_A$, defined in Sections V-A and V-B, respectively. We have already performed the policy
evaluation step for both policies. In particular, we have $V_{\pi_N}(\mathbf{b}) = C_N(\mathbf{b})$ and $V_{\pi_A}(\mathbf{b}) \approx C_A^f(\mathbf{b})$.





Fig. 3. Bounding the optimal switchover curve



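Recall that $C_N$ and $C_A^f$ are quadratic functions of $b_1$ and $b_2$. We therefore now compute the one step improvement
of an arbitrary policy $\pi$ with a quadratic cost of the form:

$$V_\pi(\mathbf{b}) = \alpha_1 b_1^2 + \alpha_2 b_2^2 + \gamma b_1 b_2 + \beta_1 b_1 + \beta_2 b_2. \tag{4}$$

For convenience, define:

$$V_\pi^1(\mathbf{b}) \triangleq V_\pi(\mathbf{b} + \mathbf{e}_1) - V_\pi(\mathbf{b}), \qquad V_\pi^2(\mathbf{b}) \triangleq V_\pi(\mathbf{b} + \mathbf{e}_2) - V_\pi(\mathbf{b}).$$

The policy improvement equation (3) can be rewritten as

$$\pi'(\mathbf{b}) = \arg\min\{0,\ s\mu[V_\pi^2(\mathbf{b} - \mathbf{e}_2) - V_\pi^1(\mathbf{b} - \mathbf{e}_1)] + s\bar{\mu}[V_\pi^2(\mathbf{b} - \mathbf{e}_1) - V_\pi^1(\mathbf{b} - \mathbf{e}_1)]\}. \tag{5}$$

It easily follows from (4) that

$$V_\pi^1(\mathbf{b}) = 2\alpha_1 b_1 + \gamma b_2 + \alpha_1 + \beta_1, \qquad V_\pi^2(\mathbf{b}) = \gamma b_1 + 2\alpha_2 b_2 + \alpha_2 + \beta_2. \tag{6}$$

Substituting (6) in (5) gives

$$\pi'(\mathbf{b}) = \arg\min\{0, \ell(\mathbf{b})\}, \tag{7}$$

where $\ell(\mathbf{b})$ is a linear function of $b_1$ and $b_2$. Note that the decision of $\pi'$ in state $\mathbf{b}$ is completely determined by the
sign of $\ell(\mathbf{b})$: the first argument of the minimum corresponds to $\overline{FE}$ and the second to FE, so $\pi'$ fetches when $\ell(\mathbf{b}) < 0$. Since $\ell(\mathbf{b})$ is linear (i.e., up to the positive factor $s$, of the form $a_1 b_1 + a_2 b_2 + a_3$ for some $a_1, a_2, a_3 \in \mathbb{R}$), the two dimensional
state-space $(b_1, b_2)$ gets split into two distinct decision regions by a straight line, corresponding to the two decisions
FE and $\overline{FE}$. In other words, policy $\pi'$ is of switchover type, in the sense of Definition 2. A little bit of algebra
shows that $a_1 = \gamma - 2\alpha_1$, $a_2 = 2\alpha_2 - \gamma$, and $a_3 = \alpha_1 + (1 - 2\mu)\alpha_2 - \beta_1 + \beta_2 - (1 - \mu)\gamma$.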
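
The linearity claim can be verified numerically. The following sketch evaluates the difference between the "fetch" and "do not fetch" arguments of the policy improvement step (3) for a quadratic cost of the form (4), and compares it with $s\,(a_1 b_1 + a_2 b_2 + a_3)$. The function names and the tuple layout of the coefficients are hypothetical, and the constant term $a_3$ (in particular its last summand $-(1-\mu)\gamma$) is a reconstruction obtained by carrying out the substitution of (6) into (5), so it should be read as an assumption rather than as a quotation of the original expression.

```python
def quadratic_cost(b1, b2, coef):
    """V(b) = alpha1*b1^2 + alpha2*b2^2 + gamma*b1*b2 + beta1*b1 + beta2*b2."""
    alpha1, alpha2, gamma, beta1, beta2 = coef
    return (alpha1 * b1 ** 2 + alpha2 * b2 ** 2 + gamma * b1 * b2
            + beta1 * b1 + beta2 * b2)

def improvement_gap_direct(b1, b2, s, mu, coef):
    """Cost-to-go of FE minus cost-to-go of not-FE, taken from (3)."""
    sb, mb = 1.0 - s, 1.0 - mu
    V = lambda x1, x2: quadratic_cost(x1, x2, coef)
    no_fetch = mu * V(b1, b2 - 1) + mb * V(b1, b2)
    fetch = (s * mu * V(b1 - 1, b2) + sb * mu * V(b1, b2 - 1)
             + s * mb * V(b1 - 1, b2 + 1) + sb * mb * V(b1, b2))
    return fetch - no_fetch

def improvement_gap_linear(b1, b2, s, mu, coef):
    """The same gap written as s * (a1*b1 + a2*b2 + a3)."""
    alpha1, alpha2, gamma, beta1, beta2 = coef
    a1 = gamma - 2.0 * alpha1
    a2 = 2.0 * alpha2 - gamma
    a3 = alpha1 + (1.0 - 2.0 * mu) * alpha2 - beta1 + beta2 - (1.0 - mu) * gamma
    return s * (a1 * b1 + a2 * b2 + a3)
```

The improved policy fetches in state $(b_1, b_2)$ exactly when this gap is negative, so the switchover line is the zero set of $a_1 b_1 + a_2 b_2 + a_3$.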

Based on the above analysis and the expressions for the cost functions derived in Lemma 1 and Lemma 2, we can
compute the switchover curves for $\pi'_A$ and $\pi'_N$, which are one step improvements of $\pi_A$ and $\pi_N$, respectively. We
will refer to the switchover curves associated with $\pi'_A$ and $\pi'_N$ as $\psi_A$ and $\psi_N$, respectively.

D. The RAND policy 

It can be argued by contradiction that $\psi_A$ bounds $\psi^*$ (the optimal switchover curve associated with $\pi^*$) from
above and $\psi_N$ bounds $\psi^*$ from below, as depicted in Fig. 3. Thus, roughly speaking, we have bounded the optimal
switchover curve in a "conical" region defined by $\psi_A$ and $\psi_N$. The cone splits the state-space into three distinct
regions: $\mathcal{R}_1$, $\mathcal{R}_2$, and $\mathcal{R}_3$, as shown in Fig. 3. In region $\mathcal{R}_2$, which lies above $\psi_A$, the optimal action is $\overline{FE}$.
In region $\mathcal{R}_1$, which lies below $\psi_N$, the optimal action is FE. Region $\mathcal{R}_3$, which is the interior of the cone, is a
region of uncertainty. Our analysis tells us that $\psi^*$ lies somewhere in region $\mathcal{R}_3$, but not exactly where.

We overcome the uncertainty in region $\mathcal{R}_3$ by employing a randomized decision policy. In particular, consider a
state $\mathbf{b} = (b_1, b_2) \in \mathcal{R}_3$. We know that $\exists\, b_2' > b_2$ such that the state $(b_1, b_2')$ lies on the surface of the cone (on
the switchover curve $\psi_A$) and the optimal decision is $\overline{FE}$ in all states $(b_1, y)$ with $y > b_2'$. Similarly, we know that
$\exists\, b_2'' < b_2$ such that the state $(b_1, b_2'')$ lies on the surface of the cone (on the switchover curve $\psi_N$) and the optimal
decision is FE in all states $(b_1, y)$ with $y < b_2''$. If $(b_1, b_2)$ is closer to $\psi_A$ than to $\psi_N$, then the optimal decision is
more likely to be $\overline{FE}$, and if $(b_1, b_2)$ is closer to $\psi_N$ than to $\psi_A$, then the optimal decision is more likely to be FE.
In particular, for any state $(b_1, b_2) \in \mathcal{R}_3$, we will make a randomized decision based on the policy RAND, as
described in Table IV.

TABLE IV
RAND: A RANDOMIZED APPROXIMATION TO THE OPTIMAL POLICY $\pi^*$.

In state $(b_1, b_2)$, select action FE w.p. $\dfrac{b_2' - b_2}{b_2' - b_2''}$.

In state $(b_1, b_2)$, select action $\overline{FE}$ w.p. $\dfrac{b_2 - b_2''}{b_2' - b_2''}$.

Here $b_2' = \arg\min_{(b_1, y) \in \psi_A} |y - b_2|$ and $b_2'' = \arg\min_{(b_1, y) \in \psi_N} |b_2 - y|$.

Numerical Example 2: This example illustrates the bounding of the optimal switchover curve $\psi^*$ in a "conical"
region generated by the switchover curves $\psi_A$ and $\psi_N$, for two different sets of parameter values $s$, $\mu$, and $c$. The
results are depicted in Fig. 4. Observe that in both cases the conical region bounds $\psi^*$ reasonably tightly.

Fig. 4. Numerical Example 2



Remark 7 (RAND and RFON have low computational complexity): Note that the switchover curves $\psi_A$ and $\psi_N$,
which bound the optimal switchover curve $\psi^*$, can be computed in closed form as functions of the system parameters
$s$, $\mu$, and $c$ by following the analysis in Section V-C. Consequently, the decision of the RAND policy can be obtained
without explicitly solving any DP equations. Also, as Numerical Example 2 demonstrated, RAND provides a fairly
good approximation to the optimal policy $\pi^*$. Now recall that the decision of $\pi^*$ is used in every time slot by the
FON algorithm in a quasi-static fashion. Similarly, RFON uses the decision of RAND in every time slot. However,
RFON has substantially lower implementation complexity than FON because, unlike FON, it does not
require the solution to a set of DP equations in every time slot. We will compare the performance of FON and
RFON in dynamic scenarios (varying $s$ and $\mu$) via simulations in Section VI.
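
A minimal sketch of the RAND decision rule is given below. It assumes that the two bounding switchover curves are supplied as callables `psi_A` and `psi_N` mapping $b_1$ to the corresponding $b_2$ coordinate on each curve; these callables, like the function names, are hypothetical and stand in for the closed-form lines obtained in Section V-C. Inside the cone the probability of fetching is interpolated linearly between the curves, and outside the cone the decision is deterministic.

```python
import random

def rand_fetch_probability(b2, upper, lower):
    """Probability of selecting FE, with upper = b2' (on psi_A) and lower = b2'' (on psi_N)."""
    if b2 >= upper:
        return 0.0   # on or above psi_A: do not fetch
    if b2 <= lower:
        return 1.0   # on or below psi_N: fetch
    return (upper - b2) / (upper - lower)

def rand_decision(b, psi_A, psi_N, rng=random):
    """Return "FE" or "not-FE" for backlog b = (b1, b2)."""
    b1, b2 = b
    p_fetch = rand_fetch_probability(b2, psi_A(b1), psi_N(b1))
    return "FE" if rng.random() < p_fetch else "not-FE"
```

For instance, `rand_fetch_probability(5, 10, 0)` returns 0.5: a state midway between the two curves fetches with probability one half.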



VI. Performance Evaluation 

In this section, we evaluate the efficacy of the proposed fetching algorithms via simulations, and contrast them with
benchmark algorithms. In particular, we simulate the following algorithms:

• OPT: The optimal fetching algorithm, computed by numerically solving the DP equations in (1). OPT is
implemented via a lookup table, which is computed offline.

• FON: A quasi-static algorithm, which makes a fetching decision based on a numerical solution to the DP 
equations for the reduced model, as given by (2). 

• RFON: Another quasi-static algorithm. RFON analytically bounds the optimal switchover curve for the reduced 
model (as computed by FON) and then constructs a randomized interpolation between the bounding curves to 
arrive at a fetching decision. 

• Always Fetch: A benchmark algorithm, which always chooses to fetch a packet from the CS, as long as the 
queue at the CS is non-empty. 

• Never Fetch: Another benchmark algorithm, which fetches a packet from the CS only when the queue at the
MT is empty.

Recall from our discussion in Section II-C that the choice of per packet buffering cost rate at the MT, denoted
$c$, determines the congestion vs. latency tradeoff, and hence, the operating point for the system. We will use this
tradeoff curve as a performance metric to contrast the performance of different algorithms. Each point on the tradeoff
curve is described by a two-tuple: $(b_2^{ave}, d^{ave})$. Here $b_2^{ave}$ is the average backlog of $Q_2$ (the queue at the MT) and $d^{ave}$
is the average end-to-end delay per task§. Both metrics are computed by averaging over an entire simulation run.
Note that $b_2^{ave}$ is a measure of congestion at the MT, while $d^{ave}$ is a measure of the overall latency experienced
by a typical task in the system. As discussed earlier, a congestion vs. latency tradeoff curve can be generated by
varying the per packet per unit time cost parameter $c$. Given this performance metric, a policy $\pi$ is better than
another policy $\pi'$ if $\pi$ yields a lower average backlog $b_2^{ave}$ than $\pi'$ for a fixed average delay $d^{ave}$, or a lower $d^{ave}$
for a fixed $b_2^{ave}$. Note that the always fetch and never fetch policies are oblivious to the cost parameter $c$; therefore,
for these policies, the entire tradeoff curve collapses to a single point.

We evaluate the performance of the five algorithms listed above under a wide variety of operational regimes.
For all scenarios considered here, the probability of successful transmission over the wireless channel from the CS
to the MT and the average task execution time at the MT are modeled as two state Markov chains (as described in
Section II). The state transition matrix for the wireless channel state is denoted:

$$P = \begin{bmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{bmatrix}, \tag{8}$$

and the state transition matrix for the processor state is denoted:

$$Q = \begin{bmatrix} q_{11} & 1 - q_{11} \\ 1 - q_{22} & q_{22} \end{bmatrix}. \tag{9}$$

§The end-to-end delay for a task is comprised of four components: the waiting time in the queue at the CS, transmission time over the wireless
channel from the CS to the MT, waiting time in the queue at the MT, and processing time at the MT once the processor at the MT is allocated
to the task.

The success probabilities in the two possible channel states are denoted by $s_1$ and $s_2$, and the average task execution
times in the two possible processor states are denoted by $1/\mu_1$ and $1/\mu_2$, respectively (equivalently, the task
execution rates are $\mu_1$ and $\mu_2$). Without loss of generality, we assume $s_1 < s_2$ and $\mu_1 > \mu_2$. Thus, the first channel
state corresponds to a "bad" channel and the second channel state corresponds to a "good" channel. Similarly, the
first processor state is one of "low" utilization, whereas the second processor state is one of "high" utilization.
The probabilities $p_{11}$ and $p_{22}$ determine the frequency at which the wireless channel switches between its "good"
and "bad" states. Similarly, the probabilities $q_{11}$ and $q_{22}$ determine the rate at which the processor at the MT
switches between states of "low" and "high" utilization. We define two more derived parameters: $\delta_s = s_2 - s_1$ and
$\delta_\mu = 1/\mu_2 - 1/\mu_1$. A large $\delta_s$ implies that the channel can enter deep fades relative to its "good" state. A large
$\delta_\mu$ implies that average task processing times go up significantly (relative to the "low" utilization state) when the
processor enters a "high" utilization state.

A variety of operational regimes can be envisioned, depending upon the values assumed by the parameters $p_{11}$,
$p_{22}$, $q_{11}$, $q_{22}$, $\delta_s$, and $\delta_\mu$. We demonstrate via simulations that the proposed algorithms FON and RFON yield tradeoff
curves comparable to the optimal tradeoff curve generated by OPT, under a wide range of operating scenarios. For
all simulations, we assume that $Q_1$, the queue at the CS, has 20 tasks initially, and no new tasks arrive to this queue
over a simulation run. We vary the cost parameter $c$ from 1 to 100 to sweep the congestion vs. latency tradeoff
curve.



A. Slow fading 

For the slow fading scenario, we assume $p_{11} = p_{22} = p = 0.9$. Thus, the expected sojourn time of the wireless
channel in each state is $1/(1 - p) = 10$ time slots. This essentially implies that each packet transmitted by the CS is
likely to experience a static channel over all transmission attempts. This could well be the case in a mobile computing
system deployed indoors, with static MTs and limited co-channel interference from other wireless networks. We
fix $q_{11} = 0.5$, $q_{22} = 0.3$. We assume that the two-tuple $(s_1, s_2)$ can take one of two possible values: $(0.1, 0.9)$ or
$(0.4, 0.5)$. For the former case, $\delta_s = 0.8$, i.e., the channel can enter a deep fade relative to its "good" state. For the
latter case, $\delta_s = 0.1$, i.e., the channel is fairly static over time. Further, we assume that the two-tuple $(\mu_1, \mu_2)$ can
take one of two possible values: $(0.9, 0.1)$ or $(0.6, 0.3)$. For the former case, $\delta_\mu = 8.89$, which implies that average
task execution times vary significantly from one processor state to the other. For the latter case, $\delta_\mu = 1.67$, which
means that average task execution times are fairly constant over time. Thus, we have the following four subcases:
$(\delta_s = 0.8, \delta_\mu = 8.89)$, $(\delta_s = 0.1, \delta_\mu = 1.67)$, $(\delta_s = 0.1, \delta_\mu = 8.89)$, and $(\delta_s = 0.8, \delta_\mu = 1.67)$.

The congestion vs. latency tradeoff curves for the four cases are depicted in Figs. 5, 6, 7, and 8, respectively. 
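
The derived quantities quoted above follow from elementary arithmetic, which the short sketch below reproduces. The function names are illustrative; `sample_two_state_chain` is a hypothetical helper showing one way a two-state Markov chain with self-transition probabilities $p_{11}$ and $p_{22}$ could be sampled in a simulation, and is not taken from the paper.

```python
import random

def expected_sojourn(p_stay):
    """Mean number of slots spent in a state with self-transition probability p_stay."""
    return 1.0 / (1.0 - p_stay)

def delta_s(s1, s2):
    """Spread of the success probability between the two channel states."""
    return s2 - s1

def delta_mu(mu1, mu2):
    """Spread of the mean task execution time between the two processor states."""
    return 1.0 / mu2 - 1.0 / mu1

def sample_two_state_chain(p11, p22, n, rng, start=0):
    """Sample n consecutive states (0 or 1) of the two-state chain."""
    state, path = start, []
    for _ in range(n):
        path.append(state)
        stay = p11 if state == 0 else p22
        if rng.random() >= stay:
            state = 1 - state  # leave the current state
    return path
```

For the slow fading values, `expected_sojourn(0.9)` is 10 slots, `delta_mu(0.9, 0.1)` is approximately 8.89, and `delta_mu(0.6, 0.3)` is approximately 1.67.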




Fig. 5. Slow fading ($p_{11} = p_{22} = 0.9$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.8$, and $\delta_\mu = 8.89$. (Horizontal axis: average delay per task.)

Fig. 6. Slow fading ($p_{11} = p_{22} = 0.9$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.1$, and $\delta_\mu = 1.67$. (Horizontal axis: average delay per task.)



B. Fast fading 

For the fast fading scenario, we assume $p_{11} = p_{22} = p = 0.1$. Thus, the expected sojourn time of the wireless
channel in each state is $1/(1 - p) \approx 1.1$ time slots. The implication is that the channel state changes almost
every time slot, so a task will experience a different channel state for every retransmission. This could be the
case if the MT is mobile or operates in an environment where co-channel interference fluctuates rapidly relative
to the dynamics of the system. Similar to the slow fading case, we consider four subcases, depending upon the
magnitude of $\delta_s$ and $\delta_\mu$. As before, these four subcases are given by: $(\delta_s = 0.8, \delta_\mu = 8.89)$, $(\delta_s = 0.1, \delta_\mu = 1.67)$,
$(\delta_s = 0.1, \delta_\mu = 8.89)$, and $(\delta_s = 0.8, \delta_\mu = 1.67)$. For all subcases, we fix $q_{11} = 0.5$, $q_{22} = 0.3$.

Fig. 7. Slow fading ($p_{11} = p_{22} = 0.9$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.1$, and $\delta_\mu = 8.89$. (Horizontal axis: average delay per task.)

Fig. 8. Slow fading ($p_{11} = p_{22} = 0.9$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.8$, and $\delta_\mu = 1.67$. (Horizontal axis: average delay per task.)

The congestion vs. latency tradeoff curves for the four cases are depicted in Figs. 9, 10, 11, and 12, respectively. 

For all eight operational regimes considered here, we observe that the tradeoff curves generated by FON and
RFON are very similar to the tradeoff curve generated by OPT. Note that the optimal policy OPT is assumed to
have a priori knowledge of all system parameters, viz., $P$, $Q$, $(s_1, s_2)$, and $(\mu_1, \mu_2)$. In contrast, FON and RFON
are only assumed to know the instantaneous values of $s$ and $\mu$. These policies have no knowledge of the possible values
either $s$ or $\mu$ can assume, or of the underlying statistics (Markovian in our examples) which govern $s$ and $\mu$. The
decisions of both FON and RFON are based entirely on the instantaneous operating point of the system through
a quasi-static approximation, which assumes that the current operating point will persist forever in the future. All
three policies have access to the backlogs of both $Q_1$ and $Q_2$.

As expected, OPT offers the best tradeoff in all operating scenarios. FON and RFON match the performance of
OPT quite closely in slow fading conditions, and also when $\delta_s$ and $\delta_\mu$ are small. This is because the quasi-static
approximation is quite accurate in conditions which fluctuate slowly, and do not vary much whenever they fluctuate.
Consistent with this intuition, the biggest departure of FON and RFON from OPT is observed in fast fading, when
$\delta_s$ and/or $\delta_\mu$ is large.

Fig. 9. Fast fading ($p_{11} = p_{22} = 0.1$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.8$, and $\delta_\mu = 8.89$. (Horizontal axis: average delay per task.)

Fig. 10. Fast fading ($p_{11} = p_{22} = 0.1$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.1$, and $\delta_\mu = 1.67$.

Finally, note that the always fetch and never fetch policies appear as two extreme points on the optimal tradeoff
curve. Thus, the always fetch policy seems to be optimal in the extreme regime where a large average congestion
$b_2^{ave}$ can be tolerated for a small latency $d^{ave}$. Similarly, the never fetch policy appears to be the best choice in the
extreme regime where a small $b_2^{ave}$ is desired, even at the expense of a large $d^{ave}$. However, neither of these policies
has the ability to provide a tradeoff between $b_2^{ave}$ and $d^{ave}$. Also, it is important to note that the optimal tradeoff
curve, as well as the tradeoff curves for FON and RFON, are convex, which means that the tradeoff curve generated
by time sharing (either randomly or deterministically) between the simplistic always fetch and never fetch policies
will be strictly worse than the proposed policies. This argument justifies the small increase in complexity incurred
by FON and RFON (relative to the simplistic always fetch and never fetch) in order to enhance system performance.

Fig. 11. Fast fading ($p_{11} = p_{22} = 0.1$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.1$, and $\delta_\mu = 8.89$. (Horizontal axis: average delay per task.)

Fig. 12. Fast fading ($p_{11} = p_{22} = 0.1$), $q_{11} = 0.5$, $q_{22} = 0.3$, $\delta_s = 0.8$, and $\delta_\mu = 1.67$.

VII. Conclusions 

A gamut of mobile applications pose diverse computation and processing requirements for battery power and
memory constrained mobile devices. Judicious prefetching of computation tasks at the mobile terminals is thus
important. Aggressive prefetching can result in congestion at the mobile device, while lazy retrieval of these tasks can
increase latency, thereby degrading application performance. We examine this buffering versus
latency trade-off under varying wireless channel conditions and mobile terminal congestion states via a dynamic
programming methodology. We suggest quasi-static and randomized algorithms that alleviate the computational
complexity of the dynamic programming solutions. Through evaluation experiments under slow and fast fading
conditions, we show that our low complexity heuristic prefetching algorithms come quite close in performance to
the optimal solution.

VIII. Appendix

A. Proof of Theorem 1 

We use value iteration and the principle of mathematical induction to prove the theorem. The DP equations in
(2) can be solved by using the method of backward value iteration, where the estimate of the value function in
state $\mathbf{b}$ at time $n$, namely $V^n(\mathbf{b})$, is expressed in terms of the estimate of the value function at time $(n+1)$,
namely $V^{n+1}(\cdot)$, as:

$$V^n(\mathbf{b}) = \min\{\mu V^{n+1}(\mathbf{b} - \mathbf{e}_2) + \bar{\mu} V^{n+1}(\mathbf{b}),\ s\mu V^{n+1}(\mathbf{b} - \mathbf{e}_1) + \bar{s}\mu V^{n+1}(\mathbf{b} - \mathbf{e}_2) + s\bar{\mu} V^{n+1}(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\bar{\mu} V^{n+1}(\mathbf{b})\} + \langle \mathbf{c}, \mathbf{b} \rangle, \tag{10}$$

along with the boundary conditions $V^{N+1}(\mathbf{b}) = 0\ \forall\, \mathbf{b}$, where $N$ is the length of the horizon over which the value
iteration equations in (10) are solved. It is known that under certain conditions, $V^0(\mathbf{b})$ converges to the true value
function $V(\mathbf{b})$, which satisfies (2), as $N \to \infty$. For guaranteed convergence, it is sufficient that $\exists\, M \in \mathbb{N}$ such that
the system reaches the terminal state $(0, 0)$ under any admissible policy within $M$ steps, regardless of the initial
state. This condition is clearly satisfied by our two queue tandem as long as $s, \mu > 0$.
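
The backward value iteration in (10) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the state space is truncated to backlogs with $b_1 + b_2 \le$ `total` (transitions never increase $b_1 + b_2$), and the handling of the boundaries is an assumption made here, namely that fetching is unavailable when $b_1 = 0$ and that a task fetched into an empty MT buffer may complete in the same slot, in line with the $b_2 = 0$ recursion of Appendix VIII-B. The per-slot cost is $\langle \mathbf{c}, \mathbf{b} \rangle = b_1 + c\,b_2$.

```python
def _after_service(V, x1, x2, mu):
    """Expected cost-to-go once the MT processor acts on backlog (x1, x2)."""
    if x2 == 0:
        return V[(x1, x2)]
    return mu * V[(x1, x2 - 1)] + (1.0 - mu) * V[(x1, x2)]

def value_iteration(total, s, mu, c, iters=2000):
    """Iterate (10) from V = 0; return the value estimates and the greedy policy."""
    states = [(b1, b2) for b1 in range(total + 1) for b2 in range(total + 1 - b1)]
    V = dict.fromkeys(states, 0.0)
    policy = {}
    for _ in range(iters):
        new_V = {(0, 0): 0.0}  # terminal state has zero cost
        for b1, b2 in states:
            if b1 == 0 and b2 == 0:
                continue
            no_fetch = _after_service(V, b1, b2, mu)
            best, action = no_fetch, "not-FE"
            if b1 > 0:
                fetch = (s * _after_service(V, b1 - 1, b2 + 1, mu)
                         + (1.0 - s) * _after_service(V, b1, b2, mu))
                if fetch < no_fetch:
                    best, action = fetch, "FE"
            new_V[(b1, b2)] = b1 + c * b2 + best
            policy[(b1, b2)] = action
        V = new_V
    return V, policy
```

The returned `policy` dictionary can be inspected over the $(b_1, b_2)$ grid to look for the switchover structure that the theorem establishes. One simple consistency check is that, with no tasks left at the CS, the value must equal $c\, b_2 (b_2 + 1)/(2\mu)$.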

Now, to establish the desired result, we will show that for any $N$, if the switchover property is satisfied at time
$(n+1)$, then it is also satisfied at time $n$. Inductively, this implies that the switchover property is satisfied at every time
$n$ for all $N$, and thus satisfied by the optimal policy $\pi^*$ in the limit $N \to \infty$.

Define

$$\omega^n(\mathbf{b}) \triangleq s\mu[V^{n+1}(\mathbf{b} - \mathbf{e}_2) - V^{n+1}(\mathbf{b} - \mathbf{e}_1)] + s\bar{\mu}[V^{n+1}(\mathbf{b}) - V^{n+1}(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2)]. \tag{11}$$

Since $\omega^n(\mathbf{b})$ is the first argument of the minimum in (10) minus the second, it is easy to verify that the optimal decision at time $n$ in state $\mathbf{b}$ is $\overline{FE}$ if $\omega^n(\mathbf{b}) < 0$, and FE otherwise.

We now fix $N$ and assume that the optimal policy at time $n + 1 \le N$, denoted $\pi^*_{n+1,N}$, satisfies the switchover
property. From the definition of $\omega^{n+1}(\mathbf{b})$, it follows that the switchover property can equivalently be interpreted
as follows: $\omega^{n+1}(b_1, b_2)$ is a non-decreasing function of $b_1$ and a non-increasing function of $b_2$. Based on this
interpretation, we want to show that $\omega^n(\mathbf{b})$ is a non-decreasing function of $b_1$ and a non-increasing function of $b_2$,
i.e., $\pi^*_{n,N}$ also satisfies the switchover property.

Note that the optimality of the switchover property at time $(n+1)$ implies that the optimal policy $\pi^*_{n+1,N}$ splits
the state space, i.e., the $(b_1, b_2)$ plane, into two distinct decision regions, corresponding to the two possible decisions
FE and $\overline{FE}$, respectively.

Now, it follows from (11) that $\omega^n(\mathbf{b})$ is a function of $V^{n+1}(\mathbf{b} - \mathbf{e}_1)$, $V^{n+1}(\mathbf{b} - \mathbf{e}_2)$, $V^{n+1}(\mathbf{b})$, and $V^{n+1}(\mathbf{b} -
\mathbf{e}_1 + \mathbf{e}_2)$. By the same token, $\omega^n(\mathbf{b} + \mathbf{e}_1)$ is a function of $V^{n+1}(\mathbf{b})$, $V^{n+1}(\mathbf{b} + \mathbf{e}_1 - \mathbf{e}_2)$, $V^{n+1}(\mathbf{b} + \mathbf{e}_1)$, and
$V^{n+1}(\mathbf{b} + \mathbf{e}_2)$. Thus, to show that $\omega^n(\mathbf{b} + \mathbf{e}_1) \ge \omega^n(\mathbf{b})$, we need to consider numerous cases, depending on the
optimal decision at time $(n+1)$ in the states of interest listed above. In the interest of space, we are only going
to consider two representative cases here: (i) all states of interest lie in the decision region corresponding to action
$\overline{FE}$, and (ii) all states of interest lie in the decision region corresponding to action FE. All other cases, where the
boundary between the decision regions splits the states of interest into two sets, can be treated as a combination
of the two representative cases. Also, we will only focus on establishing the monotonicity of $\omega^n(b_1, b_2)$ in its first
argument. The proof of monotonicity of $\omega^n(b_1, b_2)$ in its second argument follows analogously.

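Case 1: We assume that the optimal decision at time $(n+1)$ is $\overline{FE}$ in the following set of states: $\mathcal{X} =
\{\mathbf{b},\ \mathbf{b} \pm \mathbf{e}_1,\ \mathbf{b} \pm \mathbf{e}_2,\ \mathbf{b} + \mathbf{e}_1 - \mathbf{e}_2,\ \mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2\}$. Thus, for any state $\mathbf{x} \in \mathcal{X}$, we have

$$V^{n+1}(\mathbf{x}) = \mu V^{n+2}(\mathbf{x} - \mathbf{e}_2) + \bar{\mu} V^{n+2}(\mathbf{x}) + \langle \mathbf{c}, \mathbf{x} \rangle \quad \forall\, \mathbf{x} \in \mathcal{X}. \tag{12}$$

For convenience, define

$$\Delta^m(\mathbf{b}) \triangleq V^m(\mathbf{b} - \mathbf{e}_2) - V^m(\mathbf{b} - \mathbf{e}_1). \tag{13}$$

It follows from (11) and (13) that

$$\omega^n(\mathbf{b}) = s\mu \Delta^{n+1}(\mathbf{b}) + s\bar{\mu} \Delta^{n+1}(\mathbf{b} + \mathbf{e}_2). \tag{14}$$

From (12) and (13) it follows that

$$\Delta^{n+1}(\mathbf{b}) = \mu \Delta^{n+2}(\mathbf{b} - \mathbf{e}_2) + \bar{\mu} \Delta^{n+2}(\mathbf{b}) + \langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle,$$
$$\Delta^{n+1}(\mathbf{b} + \mathbf{e}_2) = \mu \Delta^{n+2}(\mathbf{b}) + \bar{\mu} \Delta^{n+2}(\mathbf{b} + \mathbf{e}_2) + \langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle. \tag{15}$$

Substituting (15) in (14) yields

$$\omega^n(\mathbf{b}) = \mu \underbrace{[s\mu \Delta^{n+2}(\mathbf{b} - \mathbf{e}_2) + s\bar{\mu} \Delta^{n+2}(\mathbf{b})]}_{= \omega^{n+1}(\mathbf{b} - \mathbf{e}_2) \text{ from (14)}} + \bar{\mu} \underbrace{[s\mu \Delta^{n+2}(\mathbf{b}) + s\bar{\mu} \Delta^{n+2}(\mathbf{b} + \mathbf{e}_2)]}_{= \omega^{n+1}(\mathbf{b}) \text{ from (14)}} + s\langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle, \tag{16}$$

implying $\omega^n(\mathbf{b}) = \mu \omega^{n+1}(\mathbf{b} - \mathbf{e}_2) + \bar{\mu} \omega^{n+1}(\mathbf{b}) + s(1 - c)$. By the same token, $\omega^n(\mathbf{b} + \mathbf{e}_1) = \mu \omega^{n+1}(\mathbf{b} + \mathbf{e}_1 - \mathbf{e}_2) +
\bar{\mu} \omega^{n+1}(\mathbf{b} + \mathbf{e}_1) + s(1 - c)$. It now easily follows from our inductive assumption ($\omega^{n+1}(b_1, b_2)$ is a non-decreasing
function of $b_1$) that $\omega^n(\mathbf{b} + \mathbf{e}_1) \ge \omega^n(\mathbf{b})$, as desired.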

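Case 2: We now assume that the optimal decision in all states in the set $\mathcal{X}$ (as defined for Case 1) at time
$(n+1)$ is FE. Thus, for any $\mathbf{x} \in \mathcal{X}$ we have

$$V^{n+1}(\mathbf{x}) = s\mu V^{n+2}(\mathbf{x} - \mathbf{e}_1) + \bar{s}\mu V^{n+2}(\mathbf{x} - \mathbf{e}_2) + s\bar{\mu} V^{n+2}(\mathbf{x} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\bar{\mu} V^{n+2}(\mathbf{x}) + \langle \mathbf{c}, \mathbf{x} \rangle. \tag{17}$$

Following the definition of $\Delta^m(\mathbf{b})$ in (13), we have

$$\Delta^{n+1}(\mathbf{b}) = s\mu \Delta^{n+2}(\mathbf{b} - \mathbf{e}_1) + \bar{s}\mu \Delta^{n+2}(\mathbf{b} - \mathbf{e}_2) + s\bar{\mu} \Delta^{n+2}(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\bar{\mu} \Delta^{n+2}(\mathbf{b}) + \langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle, \tag{18}$$

$$\Delta^{n+1}(\mathbf{b} + \mathbf{e}_2) = s\mu \Delta^{n+2}(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\mu \Delta^{n+2}(\mathbf{b}) + s\bar{\mu} \Delta^{n+2}(\mathbf{b} - \mathbf{e}_1 + 2\mathbf{e}_2) + \bar{s}\bar{\mu} \Delta^{n+2}(\mathbf{b} + \mathbf{e}_2) + \langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle. \tag{19}$$

Substituting (18) and (19) in (14) and rearranging terms yields

$$\omega^n(\mathbf{b}) = s\mu \omega^{n+1}(\mathbf{b} - \mathbf{e}_1) + \bar{s}\mu \omega^{n+1}(\mathbf{b} - \mathbf{e}_2) + s\bar{\mu} \omega^{n+1}(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s}\bar{\mu} \omega^{n+1}(\mathbf{b}) + s\langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle. \tag{20}$$

By the same token,

$$\omega^n(\mathbf{b} + \mathbf{e}_1) = s\mu \omega^{n+1}(\mathbf{b}) + \bar{s}\mu \omega^{n+1}(\mathbf{b} + \mathbf{e}_1 - \mathbf{e}_2) + s\bar{\mu} \omega^{n+1}(\mathbf{b} + \mathbf{e}_2) + \bar{s}\bar{\mu} \omega^{n+1}(\mathbf{b} + \mathbf{e}_1) + s\langle \mathbf{c}, \mathbf{e}_1 - \mathbf{e}_2 \rangle. \tag{21}$$

It now easily follows from (20), (21), and our inductive assumption ($\omega^{n+1}(b_1, b_2)$ is a non-decreasing function of
$b_1$) that $\omega^n(\mathbf{b} + \mathbf{e}_1) \ge \omega^n(\mathbf{b})$, as desired.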

All other cases not covered by Case 1 and Case 2 above are of the type: $\mathcal{X} = \mathcal{X}_{FE} \cup \mathcal{X}_{\overline{FE}}$, $\mathcal{X}_{FE} \cap \mathcal{X}_{\overline{FE}} = \emptyset$, where the
optimal decision at time $(n+1)$ in state $\mathbf{x} \in \mathcal{X}$ is FE if $\mathbf{x} \in \mathcal{X}_{FE}$, and $\overline{FE}$ if $\mathbf{x} \in \mathcal{X}_{\overline{FE}}$. These cases can be treated as
a "combination" of the two extreme cases $\mathcal{X}_{FE} = \emptyset$ (Case 1) and $\mathcal{X}_{\overline{FE}} = \emptyset$ (Case 2) treated above. We skip the details
for brevity. Finally, the proof for $\omega^n(\mathbf{b} + \mathbf{e}_2) \le \omega^n(\mathbf{b})$ is analogous to the proof above.

It thus follows from the principle of mathematical induction that for any $N \in \mathbb{N}$, $\omega^n(b_1, b_2)$ is a non-decreasing
function of $b_1$ and a non-increasing function of $b_2$ $\forall\, n = 0, \ldots, N$. The equivalence between the monotonicity
of $\omega^n(\mathbf{b})$ and the optimality of a switchover policy implies that the optimal policy at each time $n$, namely $\pi^*_{n,N}$,
satisfies the switchover property, for any fixed $N$. Finally, it follows from the convergence of the value iteration
algorithm that the optimal policy $\pi^*$, which satisfies Bellman's equations in (2), is of switchover type, as claimed.



B. Proof of Lemma 1 

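From the system dynamics described in Section II-D, it follows that

$$C_N(\mathbf{b}) = \begin{cases} \mu C_N(\mathbf{b} - \mathbf{e}_2) + \bar{\mu} C_N(\mathbf{b}) + \langle \mathbf{c}, \mathbf{b} \rangle & ;\ b_2 > 0, \\ s\mu C_N(\mathbf{b} - \mathbf{e}_1) + s\bar{\mu} C_N(\mathbf{b} - \mathbf{e}_1 + \mathbf{e}_2) + \bar{s} C_N(\mathbf{b}) + \langle \mathbf{c}, \mathbf{b} \rangle & ;\ b_2 = 0. \end{cases}$$

Rearranging and combining terms we get

$$C_N(\mathbf{b}) = \begin{cases} C_N(\mathbf{b} - \mathbf{e}_2) + \dfrac{\langle \mathbf{c}, \mathbf{b} \rangle}{\mu} & ;\ b_2 > 0, \\[2ex] C_N(b_1 - 1, 0) + \left(\dfrac{\bar{\mu}}{\mu} + \dfrac{1}{s}\right) b_1 + \dfrac{(c - 1)\bar{\mu}}{\mu} & ;\ b_2 = 0. \end{cases}$$

Solving these recursions with $C_N(0, 0) = 0$ gives

$$C_N(b_1, 0) = \frac{1}{2}\left(\frac{\bar{\mu}}{\mu} + \frac{1}{s}\right) b_1^2 + \left[\frac{1}{2}\left(\frac{\bar{\mu}}{\mu} + \frac{1}{s}\right) + \frac{(c - 1)\bar{\mu}}{\mu}\right] b_1$$

and

$$C_N(\mathbf{b}) = C_N(b_1, 0) + \frac{b_1 b_2}{\mu} + \frac{c\, b_2 (b_2 + 1)}{2\mu},$$

which is the desired result.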



C. Approximating $C_A(\mathbf{b})$ using a fluid caricature model

1) A fluid caricature model: The fluid caricature model mimics the "mean behavior" of the time slotted, packet
based model. The key attributes of the fluid caricature model are:

• $Q_1$ and $Q_2$ buffer infinitesimally divisible "fluids".

• Time is continuous.

• Fluid flows at a constant rate $s$ from $Q_1$ to $Q_2$ and flows out of $Q_2$ at a constant rate $\mu$.

• A backlog cost at unit rate for every unit of fluid is incurred at $Q_1$, and a congestion cost at a rate $c$ for every
unit of fluid is incurred at $Q_2$.

Similar to the time slotted model, no fluid arrives to $Q_1$ after time $t = 0$.
2) Proof of Lemma 2: We need to consider two distinct cases:

1) $s > \mu$: Since $s > \mu$, $Q_1$ drains faster than $Q_2$. Denoting the initial amount of fluid in $Q_1$ and $Q_2$ by $b_1$
and $b_2$ respectively, $Q_1$ first becomes empty at time $T_0 = b_1/s$ and stays empty thereafter. For $t < T_0$, the
amount of fluid in $Q_1$ at time $t$ is given by $b_1 - st$. Thus, the total backlog cost incurred at $Q_1$ over the
interval $[0, T_0]$ is $\int_0^{T_0} (b_1 - st)\, dt = \dfrac{b_1^2}{2s}$. Over the same interval, the amount of fluid in $Q_2$ at time $t$ is
given by $b_2 + (s - \mu)t$. Thus, the total congestion cost incurred at $Q_2$ over $[0, T_0]$ is $\int_0^{T_0} c(b_2 + (s - \mu)t)\, dt =
\dfrac{c\, b_1 b_2}{s} + \dfrac{c\, b_1^2}{2s}\left(1 - \dfrac{\mu}{s}\right)$. Now, the amount of fluid in $Q_2$ at time $T_0$ is $B_2(T_0) = b_2 + b_1\left(1 - \dfrac{\mu}{s}\right)$. Thus, $Q_2$
drains completely at time $T_0' = T_0 + \dfrac{B_2(T_0)}{\mu}$, and the amount of fluid in $Q_2$ at time $t$ for $t \in [T_0, T_0']$ is given
by $B_2(T_0) - \mu(t - T_0)$. Consequently, the congestion cost incurred over the interval $[T_0, T_0']$ at $Q_2$ is given
by $\int_{T_0}^{T_0'} c(B_2(T_0) - \mu(t - T_0))\, dt = \dfrac{c}{2\mu}\left[b_2 + b_1\left(1 - \dfrac{\mu}{s}\right)\right]^2$. The total cost $C_A^f(\mathbf{b})$ for the case $s > \mu$ is
given by the sum of the three costs computed above.

2) $s < \mu$: We need to consider two further sub-cases. To this end, define $T_1 = b_2/(\mu - s)$ and let $T_0 = b_1/s$,
as before. If $T_1 < T_0$, then $Q_2$ drains before $Q_1$. The backlog cost incurred at $Q_1$ over the interval $[0, T_1]$
is $\int_0^{T_1} (b_1 - st)\, dt = b_1 T_1 - \dfrac{s T_1^2}{2}$ and the corresponding congestion cost incurred at $Q_2$ is $\int_0^{T_1} c(b_2 - (\mu -
s)t)\, dt = c\left(b_2 T_1 - \dfrac{(\mu - s) T_1^2}{2}\right)$. The amount of fluid in $Q_1$ at the end of the interval is $B_1(T_1) = b_1 - sT_1$.
An additional backlog cost of $\int_{T_1}^{T_0} (B_1(T_1) - s(t - T_1))\, dt = B_1(T_1)(T_0 - T_1) - \dfrac{s(T_0 - T_1)^2}{2}$ is incurred over
the interval $[T_1, T_0]$ at $Q_1$. The cost incurred at $Q_2$ over this interval is negligible. Thus, $C_A^f(\mathbf{b})$ for the case
$s < \mu$, $T_1 < T_0$ is given by the sum of the three costs computed above.

If $T_1 > T_0$, $Q_1$ drains before $Q_2$. The computation of $C_A^f$ in this case is identical to the case $s > \mu$ considered
above.

3) "Goodness" of approximation: In this section, we explore the efficacy of $C_A^f(\mathbf{b})$ as an approximation to
$C_A(\mathbf{b})$ via a numerical example.

Numerical Example 3: This example illustrates the goodness of $C_A^f(\mathbf{b})$ as an approximation to $C_A(\mathbf{b})$. The left
side of Fig. 13 shows the fractional approximation error as a function of $b_2$ for three different values of $b_1$ for the
case $s = 0.8$, $\mu = 0.6$ ($s > \mu$). The right side of the figure depicts the same plots for the case $s = 0.6$, $\mu = 0.8$
($s < \mu$). Observe that the accuracy of the approximation increases as $b_1$ and $b_2$ increase. The relative error is below
5% for moderately large values of $b_1$ and $b_2$. The error is as much as 30% for small values of $b_1$ and $b_2$. For these
cases, however, $C_A(\mathbf{b})$ can be computed exactly, with only a few computations. The "kinks" in the plot on the
right correspond to the points at which $T_1$ exceeds $T_0$ (as defined in Lemma 2). Note that $T_0$ is fixed on each
curve, since $b_1$ is fixed on each curve. Further, $T_1$ increases linearly with $b_2$ on each curve. For fixed $b_1$, $T_1 > T_0$
implies $b_2 > (\mu/s - 1) b_1$, which is always satisfied for $\mu < s$ (left side of the figure). When $\mu > s$, with the values
chosen in this example ($\mu = 0.8$, $s = 0.6$), the condition reduces to $b_2 > b_1/3$ (determining the points at which
the kinks appear in the plot on the right side of the figure).

Fig. 13. Numerical Example 3



